-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libcontainer: init: only pass stateDirFd when creating a container #1274
libcontainer: init: only pass stateDirFd when creating a container #1274
Conversation
/cc @opencontainers/runc-maintainers |
libcontainer/factory_linux.go
Outdated
|
||
// Only get the STATEDIR if we're an init process. It's a bit late to do | ||
// anything about this if we've already brought in an fd (a racing process | ||
// could've opened the fd by now because we're in the PID namespace). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand the second sentence in this comment. Can you explain or rephrase so it can be more clear?
I think your point is to explain the effect if we set STATEDIR in exec process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If STATEDIR is set by a parent, then by the time we hit this Go code the race condition in CVE-2016-9962 could've been hit. That's what I'm trying to explain, which is why I don't attempt to close STATEDIR
if we're an init process -- we're already stuffed if we have to attempt to close the fd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you are explaining the effect of the case that if we set STATEDIR for exec process right? I think your comments in newParentProcess
is sufficient and all you do is to not set STATEDIR or pass stateDirFD to exec process, so I think you don't have to explain why you do so in this late phase, which is why I'm confused by this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, alright I'll drop the comment. The reason for writing it was to ensure that nobody would think that STATEDIR
should ever be set or checked in runc exec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Fixed @hqhq's nit. PTAL, this is quite important (it extends the CVE-2016-9962 fix so that it also works for /ping @opencontainers/runc-maintainers |
Rebased. /ping @opencontainers/runc-maintainers |
If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the *host* filesystem from that file descriptor. This was fixed in part by 5d93fed ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: 5d93fed ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>
LGTM |
1 similar comment
LGTM |
@cyphar @crosbymichael Can you please tag a new release candidate that fixes the security issue, so distributions don't need to apply further patches? |
@crosbymichael I'll make a proposal for a new rc release. |
Can we include #1237 in the new release, it fixes really annoying issue. |
I don't really have time to review that today -- it would have to wait until the end of the week. |
@hqhq @cyphar @crosbymichael Is there a way to get a new release out anytime soon? Tags are cheap, if #1237 is fixed, you could just release rc4. |
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
host filesystem from that file descriptor. This was fixed in part by
5d93fed ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.
To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).
There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.
Fixes: CVE-2016-9962
Fixes: 5d93fed ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai asarai@suse.de